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The Java™ programming language, with its 
portability, object-oriented model, support for 
multithreading and distributed programming, and 
garbage collection features, is becoming the 
language of choice for the development of large- 
scale distributed applications. Without a suitable 
performance analysis tool for Java programs, 
however, it is often difficult to analyze the 
programs for performance-tuning problems. The 
profiler included in Sun's Java Development Kit 
(JDK™) 1.1 does not provide sufficiently detailed 
trace information to address performance issues 
in large applications. Also, it does not support 
the tracing of client/server applications, which 
are very important for analyzing distributed 
systems. The JaViz performance analysis tool 
generates execution traces with sufficient detail 
to determine program hot spots, including 
remote method calls, in a distributed Java 
application program. JaViz provides a graphical 
display of the program execution tree for the 
entire distributed application in the form of a call 
graph for ease of visualization. A number of 
features allow users to analyze the execution 
tree for performance-tuning problems more 
easily than other Java performance monitoring 
tools. The usability and functionality of the JaViz 
tool set is demonstrated by applying it to an 
example distributed Java application program. 



The Java** programming language presents sev- 
eral features that appeal to developers of large- 
scale distributed systems. Features such as "write 
once, run anywhere"** portability, portable support 
for multithreaded programming, support for distrib- 



uted programming, including remote method invo- 
cation, garbage collection, and an appealing object 
model have encouraged Java use for systems with a 
size and complexity far beyond small applets. Larger 
applications, however, encounter performance prob- 
lems that may never bother users of small applica- 
tions. The developers of these applications typically 
encounter four types of problems: 

• Distributed applications. Large-scale client/server 
applications distribute objects and then execute 
across multiple machines. A critical step in per- 
formance tuning for these distributed applications 
is the identification of hot spots where there is ex- 
tensive remote method invocation (RMl). 

• Inefficient methods. Methods coded for flexibility 
and generality can cause significant performance 
problems when used extensively in larger applica- 
tions. For example, developers often discover that 
an inordinate amount of time is spent in certain 
Java String package methods. 

• Synchronized methods. Synchronized methods 
force mutually exclusive access to protected ob- 
jects. Significant contention due to a synchronized 
method can cause performance to suffer greatly. 

^Copyright 2000 by International Business Machines Corpora- 
tion. Copying in printed form for private use is permitted with- 
out payment of royalty provided that (1) each reproduction is done 
without alteration and (2) ihe Journal reference and IBM copy- 
right notice are included on the first page. The title and abstract, 
but no other portions, of this paper may be copied or distributed 
royalty free without further permission by computer-based and 
other information-service systems. Permission to republish any 
other portion of this paper must be obtained from the Editor. 
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• Memory management. Sun's JDK 1.1 and Java 2 
Software Development Kit (SDK), Standard Edi- 
tion, V 1.2.2 garbage collectors appear to have been 
optimized for applications with a relatively small 
memory footprint. As a result, large applications 
can experience unacceptably large and unpredict- 
able delays during garbage collection. 

The JaViz performance visualization tool presented 
in this paper is currently designed to address primar- 
ily the first two of these problems. However, it can 
be extended to address the remaining two problems 
as well. 

Ja Viz was developed to supplement, rather than re- 
place, existing Java performance analysis tools such 
as the profiler included in Sun's JDK 1.1,' the Hy- 
perProf tool,^ the Profile Viewer tool, ^ the JProbe** 
tool,* and the Optimizelt!** tool.' Sun's JDK, for in- 
stance, provides a profiling option to count method 
invocations both individually and by caller-callee 
pairs, It also provides information about average 
method execution time. The HyperProf tool can dis- 
play these trace files as a hyperbolic tree, allowing 
users to look at the time involved in each method, 
the methods called by each method, and so forth. 
The Profile Viewer tool also uses the profile gener- 
ated by Sun's Java virtual machine (Jvm) to display 
the caller-callee relationships with timing informa- 
tion. Due to the coarse granularity of the underly- 
ing trace, however, it is not possible to determine 
whether methods have high or low execution time 
variances or whether method execution time varies 
based on the caller's context and parameters. ALso, 
it is not possible to trace execution threads at all, let 
alone across Jvm boundaries, when rmi calls are 
made. The JProbe, the Optimizelt!, and the Visual 
Quantify'' profiling tools provide powerful graphi- 
cal analyzers to identify performance bottlenecks in 
Java programs, but they do not currently support co- 
ordinated tracing of client/server activities. 

IBM's Jinsight^ is another performance visualization 
tool for Java application programs. J insight uses 
traces from a modified Java virtual machine (ibm De- 
veloper Kit for Windows**, Java Technology Edi- 
tion, Version 1.1.x and ibm Developer Kit for MX* 
[Advanced Interactive Executive], Java Technology 
Edition, Version 1.1.6) to display performance bot- 
tlenecks, object creation, garbage collection, thread 
interaction, and object references. It also supports 
a feature to identify and solve memory leaks. Jin- 
sight, however, lacks the capability of tracing 
client/server activities across multiple Jvms. Thus, 



while Jinsight is very useful for performance anal- 
ysis of Java applications running on a single Jvm, it 
cannot be used to identify performance bottlenecks 
in distributed Java application programs. 

The existing Java profiling tools do not currently sup- 
port client/server activities, making them of limited 
use in the analysis of distributed application pro- 
grams. To extend Java's existing profiling support for 
a distributed environment, JaViz focuses on a de- 
tailed method-level trace with sufficient detail to re- 
solve RMI calls between traces. Given the extremely 
large amount of data that could be traced, it is par- 
ticularly important to minimize the tracing overhead 
both through careful design and by allowing users 
to control the set of methods to trace. Finally, it is 
expected that users would be unable to directly per- 
ceive and process the large volume of data created 
by a set of method-level traces. Accordingly, JaViz 
incorporates visualization and statistical analysis 
tools to help users understand the trace results. 

Several functionality, performance, and usability re- 
quirements drove the development of this perfor- 
mance visualization tool. Requirements for function- 
ality include: 

• The ability to trace and record elapsed time for all 
method calls executed in a Java virtual machine 

• The ability to resolve a method-call trace into an 
execution tree, with callers being parents of callees. 
This tree should include all threads of execution, 
with a thread's run method appearing as the child 
of the method that invoked it. 

• The ability to build a tree of execution that spans 
multiple Jvms. This tree should be rooted at the 
call to main from one Jvm, and should include all 
local and remote method calls executed. 

• The ability to correctly build each of these trees 
from the remaining method calls when certain 
method calls are omitted for trace-generation ef- 
ficiency 

Performance requires: 

• An implementation that does not excessively dis- 
tort the performance profile of the applications be- 
ing traced 

• An implementation that allows traces to be gen- 
erated as quickly as feasible 

• User control over the methods to be traced. For 
example, users should be permitted to avoid trac- 
ing Java library methods if they are only interested 
in performance issues in their own code. 
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Figure 1 The flow of Information In the JaVIz Instrumentation and visualization process 
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Usability requires: 

• An easy-to-use tracing and trace-processing envi- 
ronment 

• Aji easy-to-use interface for exploring trace results 

The remainder of this paper describes the JaViz per- 
formance analysis tool set and its implementation 
and use. The next section describes the tools in de- 
tail, focusing on what they accomplish and how they 
work together. The following section describes the 
implementation of the tools, including the modifi- 
cations needed to the Jvm to generate traces, the al- 
gorithms for resolving RMI calls across traces, and 
the algorithms for processing traces into execution 
trees. It is followed by an example of the use of the 
system. Finally, we present conclusions and describe 
some future enhancements that are planned for the 
JaViz tool set. 



The JaViz performance visualization tool set 

JaViz provides an execution profile of a Java pro- 
gram in the form of an execution tree that shows the 
caller-cailee relationship of the method calls in the 



program, with the callers being the parents of the 
called methods in the tree. An execution tree rep- 
resents the call graph of a single client or server pro- 
gram. Thus, in a client/server environment with n 
different client or server Jvms, JaViz will generate 
n different execution trees representing the execu- 
tion profile of each client or server. The execution 
tree shows all method calls that were invoked as part 
of the corresponding client's (or server's) execution. 
Thus, RMI calls that originated from the client pro- 
gram to be executed on other server Jvms will be in- 
cluded in the client's execution tree. On the other 
hand, incoming RMI calls that executed on this Jvm 
are part of another client's execution. Hence, these 
incoming rmi calls will be excluded from its execu- 
tion tree. 

As shown in Figure 1, JaViz consists of three major 
components: an instrumented Jvm that has been mod- 
ified to trace method calls, a set of postprocessing 
tools that resolve remote method calls and generate 
statistics, and ai visualization tool. The postprocess- 
ing and visualization tools are written in the Java lan- 
guage and thus are portable across any platform that 
supports the language. 
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Java class files are executed by the instrumented Jvm 
to generate the execution traces. These trace files 
are then processed to generate the dynamic execu- 
tion tree. The first step in the postprocessing phase 
merges the traces from multiple Jvms to follow the 
RMi invocations. These merged traces are further 
processed to build the execution call-graph tree, 
which can then be displayed by the visualizer. Dur- 
ing the merge step, some performance statistics for 
each method are gathered and fed to the visualizer 
to be displayed along with each method call. The fol- 
lowing subsections provide more details about each 
of these components of JaViz. 

Instrumented Jvm. There are several difi^erent tech- 
niques for collecting run-time execution information, 
including techniques such as sampling and direct in- 
strumentation." Sampling requires the running ap- 
plication to be stopped periodically to obtain infor- 
mation on methods that are currently being executed. 
The accuracy of the information obtained through 
sampling is determined by the sampling frequency. 
A higher sampling frequency can provide more de- 
tailed information, but this greater detail comes at 
the expense of greater perturbation of the execut- 
ing program. Direct instrumentation, on the other 
hand, adds code to the Jvm to directly measure 
method execution times. This approach provides 
more precise information than sampling, since no 
execution steps are missed. However, the additional 
instrumentation code may produce greater pertur- 
bations than sampling. Nevertheless, JaViz uses di- 
rect instrumentation to provide detailed information 
about each method's execution. To minimize per- 
turbation, the Jvm was modified only to the extent 
necessary to generate enough trace information to 
visualize the execution call graph. 

As a Java program is executed by the instrumented 
Jvm, three trace files are generated, as shown in Fig- 
ure 1. The .prf file contains detailed trace informa- 
tion that records call and return time stamps for ev- 
ery method executed. Invocations of the same 
method executed under different threads are distin- 
guished from one another by their unique thread 
identifiers. The other two files record the 
client/server interaction, if any, that occurs on the 
Jvm as the program is being executed. The .dp file 
contains information about all of the outgoing RMI 
calls from the running Jvm, i.e., identifying informa- 
tion for remote methods invoked by this Jvm. The 
.svp file records information about all incoming RMI 
calls, i.e., all of the methods remotely invoked on 
this Jvm by other Jvms. For the purposes of this ex- 



planation, the .dp file is referred to as the client pro- 
file of remote methods for which the Jvm acts as a 
client, and the .svp file is referred to as the server 
profile of remote methods for which the Jvm acts as 
a server. Note that a server Jvm may also execute cli- 
ent-type functions and a client Jvm may also act as 
a server to other Jvms. 

To reduce the amount of trace data collected, a num- 
ber of filtering options are available. One option al- 
lows the user to eliminate the tracing of Java library 
calls. This option limits tracing to only the top-level 
method calls made to any Java library call. The sec- 
ond filtering option specifies a list of classes to be 
traced. Only method invocations within the objects 
of those classes and their subclasses are traced. With 
filtering, the direct caller-callee relationship among 
the method calls may not be shown in the final call 
graph if, for example, the caller method is filtered 
out. The execution tree, however, will have all de- 
scendents connected properly to their ancestor node 
even though some intermediate ancestors may be 
missing. 

Postprocessing. The information in the trace files 
generated by the instrumented Jvm needs to be fur- 
ther processed to generate the dynamic execution 
tree of the Java program in a form that facilitates 
the visualization process. This postprocessing is done 
in two separate steps. In the first step, called the 
merge step, the client and server profiles are merged 
to produce complete traces for each Java virtual ma- 
chine. During this merge process, run-time statistics 
are gathered for the method calls in the trace files. 
These statistics are subsequently displayed by the vi- 
sualizer. The tree generation step follows the merge 
step to generate the dynamic execution tree for a 
given program. 

Merge step. Once all of the client and server programs 
have completed execution, trace files from each Jvm 
involved in a distributed program are available for 
further proces.sing. The three trace files generated 
per Jvm — the detailed trace, the client profile, and 
the server profile — are merged to produce one de- 
tailed trace for each Jvm that contains all of the rel- 
evant information, including the client/server activ- 
ity. Using the client and server profiles of the different 
Jvms, the merge process tracks remote method calls 
made by any Jvm from the client-side detailed trace 
to the server-side detailed trace. At the end of this 
step, the merged detailed trace of each Jvm contains 
pointers to the merged trace files of the other Jvms 
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such that the path of every remote call from the cli- 
ent to the server can be uniquely identified. 

Tree generation. The tree generation step analyzes 
the merged trace files to create an output file con- 
taining the dynamic execution tree for a given client 
or server program. This output file is used by the vi- 
sualizer to display the call graph. As stated earlier, 
each .mrg file contains trace information for a par- 
ticular Jvm with links to other .mrg files for the re- 
mote calls made to those Jvms. To generate the dy- 
namic execution tree for a given client or server, the 
tree generation step must follow these RMi links to 
the .mrg files of the other Jvms and extract the de- 
tailed trace information related to the RMI call to be 
merged into the execution profile for the desired cli- 
ent or server. 

The tree generation step takes the .mrg file corre- 
.sponding to the desired client or server as its input. 
It processes each method call in the trace file and 
writes a corresponding record to the output .jta file. 
If the current method call being processed is an RMI, 
it goes to the appropriate .mrg file and picks up the 
corresponding traces from there. Each record in the 
.jta file, which corresponds to a single method invo- 
cation, contains a pointer to its parent method and 
a pointer to its last child method (the last method 
invoked by this method). Each child method has a 
pointer to its previous sibling method. In addition 
to the parent-child links to reflect the call graph, each 
record contains such information as the number of 
methods invoked by this method, the time when the 
method started, the time when it completed, the 
thread executing this method, the method identifier 
of the method call being represented, and the spe- 
cific Jvm on which the method is executed. The in- 
formation provided in the .jta file makes it more ef- 
ficient for the visualizer to display the call graph. To 
allow the visualizer to display the corresponding 
method names, a mapping of method identifiers to 
method names is provided in a .mth file. 

Run-time statistics generation. To faciUtate the per- 
formance analysis of the call graph, statistical infor- 
mation about each of the methods in the program 
is gathered in the merge step. Each detailed .prf trace 
file is analyzed to gather the total number of calls 
made to each method, the maximum, minimum, and 
average execution times, and the standard deviation 
of the execution time for each method. The statis- 
tics for all the methods are written in the jvm.stt file. 
The statistics are subsequently displayed by the vi- 
sualizer when a node is selected in the call graph. 



Visualizer. The visualizer is the last component of 
the jfaViz tool set. It reads the .jta output file of the . 
postprocessing step, which contains the dynamic ex- 
ecution tree of a program, and graphically displays 
it as a tree with detailed node information. The vi- 
sualizer provides the user with the ability to navi- 
gate through the tree nodes, expand and contract 
the tree, search for methods with particular at- 
tributes, and map method attributes to graphical dis- 
play parameters, such as color, size, and arc width, to 
make it easier to identify program areas of interest. 

The visualizer consists of two windows, the tree win- 
dow and the node information window, as shown in 
Figure 2. The tree window displays the graphic rep- 
resentation of the execution tree. Initially, the tree 
displays only the root node and its immediate chil- 
dren. The user then can select any child of the root 
and visualize its children, and so on out to the leaf 
nodes. At any instance, the tree window shows all 
of the open nodes and arcs with the currently selected 
node highlighted by a star pattern. Leaf nodes are 
displayed with an underscore to indicate that these 
nodes cannot be explored further. 

The node information window displays data about 
the currently selected method node. Table 1 lists the 
information displayed in this window. 

The visualizer provides the user with a collection of 
functionalities, listed in Table 2, that can be accessed 
from the tree window through mouse buttons, a 
menu bar, pop-up menus, and keystrokes. Besides 
allowing the user to highlight nodes with specific at- 
tributes, the mapper also allows the user to specify 
range queries. Instead of entering all of the differ- 
ent parameters for a query, the u.ser can simply spec- 
ify the attribute of interest. The visualizer will then 
map different values (or value ranges) of that at- 
tribute to display attributes. For instance, if the user 
wants to highlight the nodes with different thread 
identifiers using different node colors, the visualizer 
will automatically assign a new color for each iden- 
tifier. Without this feature, the user would have to 
manually create many individual specific mappings 
to have the same effect. 

Implementation details 

This section describes the implementation details of 
the various components of JaViz. Specifically, the 
modifications needed to the Jvm to gather the de- 
tailed profile traces are described, along with the 
client/server traces, the algorithms used to generate 
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Figure 2 The visualizer portion of the JaVIz tool s( 




merged traces by resolving RMl calls across Jvms, and . 
the algorithms used to process traces to build the 
dynamic execution tree. The implementation of the 
visualizer is also discussed. 

Instrumented Jvni. The profiler that comes with 
Sun's JDK 1.1.5 provides limited trace information," 
recording only the cumulative time in milliseconds 
for each caller-callee pair. Moreover, the profile data 
do not include any client/server activity. The Jvm 



used in JaViz is modified to generate more detailed 
information for each method call as well as to gen- 
erate traces of client/server programs that span across 
multiple Jvms. In particular, the Jvm is instrumented 



• Generate detailed traces of every method call for 
the program it is executing 

• Generate traces to track the client/server interac- 
tion for the program in a client/server environment 
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Table 1 Information displayed in the node information 
window 



Table 2 Functionalities provided by the visualizer to 
aid performance analysis 



-Load ' Loads a new trace file and method file 

for display 

Mapper Maps a specific data quer>' (thread 

identifier, machine name, method 
name, and so on) to a particular 
display style (node or arc color, 
shape, or size) 

Find next Finds the next node matching the 

given parameters 

Zoom Zooms in or out, vertically or 

horizontally 

Key- Displays inlormation about the 

applied mappings 

Reset Resets the root of the displayed tree 

^ to the original root 

Up one level Sets the root of the tree to the parent 
node of the current root (disabled if 
the root IS the original root) 

Open level . Displays all the nodes in the next level 

Refresh Redraws the tree 

Exit . Exits the visualizer 



Method name 
Caller name 
Machine name 
Thread number 



This method: 
Average time 



This call. Call time 
This call: Return 

This call: Total time 



Name of the method represented 

by the node 
Name of the method that called 

the current node 
Name of the Jvm on which the 

method executed • 
A number identilying ihe thread 

that executed this method 

The total number of times this 

method is railed in the 

program 
The total time this method took 

to execute over all its 

invocations in the program 
The minimum time this method 

took to execute over all its 

invocations in the program 
TTie maximum time this method 

look to execute over all its 

invocations in the program 
The average execution time of 

this method over all its 

invocations in the program 
The standard deviation of the 

execution times of this method 

in the program 
The time in microseconds when 

this method instance started 
The time in microseconds when 

this method instance 

completed 
The total execution time, in 

microseconds, of this method 



Detailed trace generation. The trace generation mod- 
ule of the Jvm is modified to record every invoca- 
tion of a method using time stamps that show the 
start and end times of the method with microsecond 
resolution. Additionally, a thread identifier is recorded 
to uniquely identify the thread executing the metliod. 
Sun's JDK 1.1.5 Jvm implementation uses a function 
called java_mon to trace method calls. If the "-prof 
profiler flag is set, the java_mon function is called at 
every method invocation and exit. JaViz uses a sim- 
ilar strategy to trace method calls by invoking its own 
profiling function instead of the java_mon function. 
The profiling function creates a new trace record in 
a buffer for each method entry or exit event. For 
faster processing, the trace information is stored in 
main memoiy and written to external files only when 
the buffer overflows. Since disk operations are time- 
consuming, it is important to minimize the number 
of writes to the external file. Consequently, the trace 
generation reduces the size of the records stored for 



the method calls by storing the time values as 32-bit 
integers. A 64-bit reference time is written on the 
external file whenever the 32-bit time ovci tlows. Fur- 
thermore, the method name associated with each 
time value is usually quite large, typically hundreds 
of Sytcs. Instead of storing the entire name, a 4-byte 
method identifier is used. To generate unique iden- 
tifiers, a hash table in main memory translates the 
method' names to corresponding identifier values. 

Since the number of methods called within a pro- 
gram can be quite large, and since each instance of 
a method generates a trace entry, the amount of trace 
data generated for a large application would be enor- 
mous. Filtering options are provided with the instru- 
mented Jvm to reduce the amount of trace data col- 
lected. When the "exclude Java library calls" filtering 
option is enabled, the Jvm checks the caller of the 
current method in the execution stack. If the caller 
belongs to a Java or Sun package, which indicates 
that the caller is a Java library method, the current 
method is not traced. The second filtering option 
specifies a list of classes to be traced. For this op- 
tion, the class hierarchy of the object on which the 
current method is invoked is checked to see whether 
it belongs to any of the classes specified. If so, the 
method is traced. Otherwise, the method invocation 
is simply ignored. 

Clientlserver trace generation. One of the unique fea- 
tures of JaViz is its ability to track a program's ex- 
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Figure 3 An example of a remote method Invocation In Java 




ecution across multiple Jvms. The Java remote 
method invocation (RMl) facility allows one Jvm 
to execute a method on another Jvm, which may be 
executing on a physically distributed processor. In 
Figure 3, Jvm 1 and Jvm 2 are Java virtual machines 
running on two different physical hosts. In this par- 
ticular case, Jvm 2 acts as a server while Jvm 1 acts 
as a client. The Factory_Warehouse instance is a re- 
mote object created on Jvm 2 that is available to 
Jvm 1. RMI invocation procedures allow the client 
Customer instance to obtain a reference to the re- 
mote Factory_Warehouse instance. 

Jvm 1 executes the method place_order within the 
Customer object. The place_order method invokes 
Warehouse_lnstance.lookup where the Warehousejn- 
stance object contains the remote reference to the 
Factory_Warehouse object. Internally, this call is 
directed to Factory_Warehouse_Class_Stub.lookup, 
where the Factory_Warehouse_Class_Stub instance is 
a client-side dummy of the remote object. The stub 
object then establishes a tcpap (Transmission Con- 
trol Protocol/Internet Protocol) connection with the 
server-side dummy, an instance of Factory_Ware- 
house_Class_Skel. The stub object on the client side 
communicates with the skel (skeleton) object on the 



server side through the object serialization protocol 
for remote invocation. ' The skel object finally in- 
vokes the remote method lookup on the actual re- 
mote instance of Factory_Warehouse and returns the 
result back to the client-side stub method. This then 
completes the RMI procedure. Note that this RMI pro- 
cedure is specific to Sun's JDK 1.1. 

To trace client/server activities through rmi, evei^ 
object to be exported to a remote Jvm is given a 
unique identifier automatically by the server Jvm. 
Similarly, each method that can be remotely invoked 
in an exported object is also given a unique (within 
a class) identifier by the rmi module. For every re- 
mote method invoked through rmi, JaViz's modi- 
fied Jvm records these identifiers at both the client 
side and the server side. For example, in Figure 3, 
the Factory_Warehouse instance may be identified as 
Object 25 on the server. This number will be unique 
on the server so that the performance analysis tool 
knows that every remote call to Object 25 is always 
directed to this Factory_Warehouse instance. Simi- 
larly, if the remote method lookup is identified as 
Method 5, the combination of Object 25 and Method 
5 will identify Factory_Warehouse.lookup on the 
server. 
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Figure 4 The client and server profile entries for the RMI call In the example In Figure 3 
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To match corresponding entries in the server and 
client profiles, the modified Jvm also records the ma- 
chine (Jvm) names. On the client side, the server ma- 
chine name on which the call was invoked is re- 
corded. On the server side, the client machine name 
from which the call originated is also recorded. The 
client-side port number of the TCP/IP connection used 
for the remote call is recorded in both the server and 
the client profiles. The port number is needed to dis- 
tinguish between remote calls made to the same 
server from different client Jvms residing on the same 
physical machine. 

The identifier of the thread that invoked the remote 
call is recorded on the client side in order to map 
the detailed trace entry of the remote method in- 
vocation to the corresponding client profile entry. 
Similarly, the identifier of the thread where the re- 
mote call is received is recorded on the server side. 
Finally, the time stamps — the time at which the re- 
mote call was invoked on the client and the time at 
which the call was received on the server — are also 
recorded. 

The trace entries in the client and .server profiles are 
shown in Figure 4 for the RMI call Warehouse_ln- 
stance.lookupC ) from the client on Jvm 1 to the server 
on Jvm 2 in the example in Figure 3. The client en- 
try indicates that it is an RMI call to Object 25 for 



Method 5 on the server running on Machine 
cslp62b.cs.umn.edu, which in this particular exam- 
ple is the Factory. Warehouse. lookup method on Jvm 
2. The trace information also records the identifier 
of the thread invoking the RMi, which is "30144624," 
the port number "4667" of the TCP/IP connection, 
and the time stamp "1022150658651." The corre- 
sponding entry in the server profile indicates that it 
is an incoming rmi call from the client on Machine 
cslnt3.cs.umn.edu through Port 4667 for Method 5 
on Object 25. Thus, the server-side record links back 
to the entry in the client profile. The entry also rec- 
ords the thread identifier "29740592," and the time 
.stamp, "1033151850258." 

Note that the time-stamp values at the client and the 
server are not the same since the clocks in a distrib- 
uted system are not guaranteed to be synchronized. 
JaViz does not depend upon having synchronized 
clocks on the various client and server machines. This 
design decision reflects the real-world likelihood that 
distributed applications, particularly those with 
shared servers, may be unsynchronized and may in- 
deed be managed separately and therefore hard to 
synchronize. 

The lack of synchronized clocks requires some ad- 
ditional processing and has only minimal conse- 
quences. The major additional processing is needed 
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because only chronological time-stamp order, not 
time stamps, is used to resolve client/server calls. In- 
deed, because of variance in network overhead, such 
times could not be uniquely used in applications with 
more than one thread or process making RMi calls 
to the same server. The technique used in JaViz re- 
solves the calls without needing a synchronized clock. 
A second consequence of unsynchronized clocks is 
the apparent lack of time continuity when tracing 
execution across RMI calls. The time displayed in the 
server might precede the call time, or follow the re- 
turn time, displayed in the client. This discrepancy 
can be addressed by rescaling times to create a "plau- 
sible global time." Thus far, this discontinuity has 
not been found to be a significant distraction, par- 
ticularly since elapsed time is displayed and each call 
has both call and return time measured on the same 
clock. A third consequence is a minor limitation on 
the ability to measure network overhead. It is pos- 
sible to measure total network overhead by subtract- 
ing elapsed time on the server from elapsed time on 
the client (this measure includes time in the RMI call 
on both ends). It is not possible, however, to break 
this down further into call and return time, since the 
server time cannot be placed within the client win- 
dow. No user has requested this ability, however, in 
part becau.se network overhead is predictable, and 
in part because it is out of the control of the opti- 
mizer (except at the granularity of RMI invocations). 

Trace generation with the Java 2 environment. JaViz 
was designed to instrument and profile applications 
running in the widely used Sun JDK 1.1 Java virtual 
machine. Java execution environments frequently in- 
clude just-in-time (jit) compilers to enhance Java 
performance. " Some changes in the JaViz trace gen- 
eration module will be needed to retain the accu- 
racy of the tool when used in such execution envi- 
ronments. JaViz already provides the proper hooks 
into RMI to trace RMI calls, whether the initiator is 
JIT compiled or bytecode interpreted. Tracing 
method calls in JaViz requires only that they not be 
compiled inUne. If JIT compilers do compile meth- 
ods inline, however, we expect declarations to be 
available to prevent the inline compilation of the par- 
ticular methods being traced. 

Sun's Java 2 SDK, Standard Edition, v 1.2.2 provides 
extended trace generation support that will allow 
JaViz to trace JiT-compiled methods. All the detailed 
tracing information currently captured by the instru- 
mented Jvm of JaViz can also be obtained through 
the new Java Virtual Machine Profiler Interface 
(JVMPI) provided in Sun's Java 2 SDK. JVMPI will 



be able to provide trace information on JiT-compiied 
methods as well as object instances — two key pieces 
of information that are missing from the current im- 
plementation of JaViz. The only problem is that the 
resolution will still be milliseconds. As we extend 
JaViz to the Java 2 SDK, we intend to use jvmpi as 
much as possible to remove JaViz's dependence on 
a modified Jvm. The tracing of client/server activity, 
however, will still need to be done by modifying the 
RMI library implementation. 

Future enhancements to Sun's RMI support will al- 
low HOP (Internet Inter-Orb [object request broker] 
Protocol) to be used as a transport protocol, giving 
interoperability with CORBA** (Common Object Re- 
quest Broker Architecture**) -based applications 
and services.'-' Since the Jvm instrumentation of 
JaViz is restricted to the underlying TCP/IP layer for 
network connection identification and higher-level 
object, method, and thread identification, the RMI 
enhancement to support HOP is not expected to have 
any impact on our client/server tracing mechanism. 

Postprocessing. The postprocessing step takes the 
detailed .prf profile of each Jvm, along with the .dp 
and the .svp files, as input. The merge and tree gen- 
eration substeps process these input files to produce 
a dynamic execution tree for the desired client or 
server. The details of these steps are described in 
the following subsections. 

Merge step. The main part of the merge process is 
to link client calls recorded in the client profile of 
a client Jvm to the appropriate entry in the server 
profile of the remote server Jvm where the method 
was actually executed. To see how the correspond- 
ing client- and server-side entries are matched, con- 
sider a simple scenario where there is one client Jvm 
interacting with one server Jvm. In a single-threaded 
application, multiple calls made from the client to 
the same remote object and method on the server 
can easily be matched using remote object and re- 
mote method identifiers recorded in the client- and 
server-profile entries. The entries with the same re- 
mote object and remote method identifiers on both 
sides can be aligned in chronological order and the 
entries matched in that order. Note that the clocks 
on the two machines need not be synchronized for 
this matching to succeed. Instead, the matching pro- 
cess relies only on the relative order of calls within 
each Jvm. Since the Jvm is multithreaded, however, 
there could be a second call made to the server from 
the client through another thread before the first call 
even begins, if Java threads are mapped to different 
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native threads in the Jvm implementation. Thus, the 
second call may arrive at the server before the first 
one.. The remote object and remote method iden- 
tifiers are not sufficient to unambiguously link the 
appropriate calls in this case. Consequently, the port 
number of the TCP/IP connection is used to resolve 
this ambiguity. 

Consider the following sample trace entries from a 
client-side Jvm (top) and a server-side Jvm (bottom). 
The entries are listed in chronological order. 

port1 :myserver:object3:method4 
port2:myserver:object3:method4 

port2:myGlient:object3:method4 
port1 :myclient:object3:method4 

In the first two entries, port1 and port2 are the client- 
side port identifiers of the TCP/IP connection. Rely- 
ing only on time ordering to match machine name, 
method identifier, and object identifier, the first en- 
try on the client side would be matched to the first 
entry on the server side. However, since their port 
numbers do not match, the first entry on the client 
side is matched to the second entry on the server 
side. Because there cannot be more than one con- 
nection active at a time through a single TCP/IP port, 
all RMI calls going through the same TCP/IP connec- 
tion are guaranteed to be time-ordered. Further- 
more, the TCP/IP port numbers active at the same 
time on the same physical machine for different cli- 
ent Jvm processes are guaranteed to be unique. Even 
though this situation ihay be rare in a single-client 
scenario, it is not unusual when there are multiple 
clients running on the same physical host interact- 
ing with the same server. 

In addition to the matching of corresponding client- 
and server-side entries, the entries in the client pro- 
file must be linked to the correct method entries in 
the detailed profile of the client Jvm. As shown in 
Figure 3, all client calls originate from an instance 
of a stub class. All method calls in any stub class are 
picked up from the detailed profile and matched with 
entries in the chronologically ordered client profile 
using the method name, the class name, and the 
thread identifier. Matching the thread identifiers is 
necessary since multiple threads could be active at 
the same time in a Jvm on a multiprocessor machine. 

The entries in the server profile also must be linked 
to the correct method entries in the detailed profile 
of the server Jvm. Server-side execution of a remote 



method is initiated by an instance of a skel class (re- 
fer to Figure 3) with the invocation of a dispatch 
method. This dispatch method in turn invokes the 
desired remote method in the remote object. All dis- 
patch method calls in any skel class are extracted 
from the detailed profile and matched with entries 
in the chronologically ordered server profile based 
on the class name and the thread identifier. Note 
that the actual method name on the detailed profile 
entry will always be dispatch. 

Tree generation. The execution traces in the .mrg files 
cannot be directly used to display the call graph since 
the trace data do not provide any explicit parent-child 
links among the method calls. However, the infor- 
mation included in the traces is sufficient to build 
the call graph. The main goal of the tree generation 
step is to make the parent-child relationships among 
the dilferent methods in a given .mrg file explicit in 
order to build a dynamic execution tree that can 
readily be used by the visualizer to display the call 
graph. The .mrg file lists the start time, end time, and 
thread identifier of each local method in time or- 
der. For remote methods, it also lists the name of 
the Jvm on which the RMi was executed. The .mrg 
file for this remote Jvm must be analyzed to build 
the portion of the execution tree corresponding to 
the remote method's execution. The input .mrg file 
may also contain traces of remote methods invoked 
by other Jvms but executed on the local Jvm. Since 
these RMI traces are part of a different client pro- 
gram's execution, they must be ignored while build- 
ing the execution tree. Thus, the tree generation al- 
gorithm deals with three main issues: 

• Making the parent-child relationship among 
method calls explicit 

• Incorporating any RMI processing performed for 
the input .mrg file that appears in other .mrg files 

• Excluding the traces for remote methods in the in- 
put .mrg file that come from other Jvms 

These algorithms are now discussed in detail. 

Determining parent-child relationships. A stack-based 
algorithm is used to determine the parent-child re- 
lationship among the method calls in the merged 
traces. Except for one special case involving new 
thread creation, a method that calls child methods 
completes its execution only after all of its called 
methods complete. Since the methods in the merged 
trace appear ordered by their start times or end 
times, a method that calls other methods encloses 
the start and end times of all its children within its 
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Figure 5 



(A) Example code to demonstrate the tree generation process. (B) Ttie corresponding entries in the .mrg 
fiie for the method calls shown in (A). "S" denotes a method's start time while "E" denotes the end time. 



(A) .public void A (){ ' (B) IVIethod Time Stamp Thread 

-■' A ; .■ ■ S100 • ■ 

voidB(){ B S150 

B(); ► - ■ < >P . S1:65 

P{); P E180 

CO; - •••• • • ■ ;■ . ■■ ■■ ■ B . ::/-.Er98 ' 

... } . C S200 

} ' r C ^ - E220 



own start and end times. Thus, a stack of methods 
is maintained where a node is pushed onto the stack 
each time its start time is found and is popped off 
the stack and written in the output file when its end 
time is encountered. The order of the methods ap- 
pearing in the .mrg file ensures that a parent method 
is not popped off before all of its child methods are 
pushed onto the stack. This ordering atso guaran- 
tees that the parent of the most recently inserted 
method (i.e., the top method) is immediately below 
the inserted method on the stack. 

This single stack of methods works correctly to de- 
termine parent-child relationship as long as the pro- 
gram has a single thread of execution. In a mul- 
tithreaded application, however, the methods of 
different threads of execution are interleaved in the 
trace. Thus, it cannot be guaranteed that the parent 
of the topmost method on the stack is immediately 
below it. However, with a separate stack for each 
unique thread of execution, the above algorithm ap- 
plies to each stack independently. Thus, the com- 
plete stack-based algorithm actually uses multiple 
stacks, one for each newly created thread, to deter- 
mine the parent-child relationships of the methods. 

In the example shown in Figure 5(A), Methods B 
and C are called by Method A and consequently be- 
come the children of Method A. Method P is the 
child of Method B. Assuming that all four methods 
belong to the same thread (Thread 1), the corre- 
sponding .mrg file will have the entries shown in Fig- 
ure 5(B). Figure 6 shows the application of the stack 
algorithm to the trace in Figure 5(B). Since the ex- 
ample has only one thread, only a single stack is 
shown. 



The stack-based algorithm assumes that a parent 
method does not complete until all its children com- 
plete. This assumption is valid for all cases except 
for one special case involving the creation of new 
threads of execution. In the Java Thread class, two 
methods are defined that are used for initiating new 
threads — the start method and the run method. To 
start a new thread object, the caller invokes the 
thread object's start method. Start then invokes the 
thread object's run method, which carries out the ac- 
tual execution of the new thread. In this situation, 
the initiating thread is the parent of start while start 
is the parent of run. However, start merely initiates 
the run method and often returns before run even 
begins executing. As a result, the previously described 
stack-based algorithm cannot determine that start is 
the parent of run since start will be popped off the 
stack before run is even pushed on to the stack. 

To remedy this problem, the start methods are put 
into a special queue and linked to their correspond- 
ing run methods. When the start time of a start 
method is encountered, it is inserted in the queue, 
in addition to the other stack processing. When the 
start time of a run method is encountered, it is pushed 
on to its newly created stack, since this is a new thread 
of execution. Furthermore, instead of the preced- 
ing element link, it is linked to the head element of 
the start queue as its parent. Thus, even if start fin- 
ishes before its corresponding run and so is popped 
off the stack, the parent-child relationship is still 
maintained since the queue holds the parent. 

The following is a portion of an execution trace where 
Method A, in Thread 1, spawns a new thread T 
(Thread 2), by invoking T.start( ). T.start( ) subse- 
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Figure 6 Application of the stacii-based algoritlim for tree generation to the example In Figure 5 




1 . push A Z push 8 3. push P 4. pop P 9. pop B 6. push C 7. pop C 6. pop A 



quently invokes T.run( ), which creates the new 
thread. 

A SlOO 1 

T.start S109 ] 

T.start El. 15 1 

B S145 1 

B E167 1 

T.run S204 2 

C S220 2 

C E254 2 

T.run E300 2 



Figure 7 .shows how the stack-based tree generation 
algorithm handles this special case of linking run 
methods to corresponding start methods when the 
start returns before run even begins. Since the ex- 
ample deals with two different threads, the process- 
ing uses two stacks, one for Thread 1 and the other 
for Thread 2. 

In a Java application program, execution usually be- 
gins with the invocation of the main( ) method, which 
in turn invokes other methods. Thus, the call graph 
of such an application program would have main as 
the root of the dynamic execution tree with other 
methods spawning from main. However, the Java 
run-time system often invokes a number of meth- 
ods that are not spawned from n^ain or from any 
methods within the program itself. Also, in a 
client/server environment, it is possible to have a 
method spawned by a client in the server process. 
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A tree rooted at main will not include such methods 
since they are not spawned by main. The concept of 
the virtual wot, which is the root node of any dynamic 
execution tree, is introduced to handle such cases. 
This node ties together main and any other methods 
that are spawned by the run-time system or by a dif- 
ferent client or server. 

RMI processing. For RMi calls invoked by the Input 
client program, the merged trace for a particular Jvm 
contains an entry with a link to the remote Jvm that 
executed the rmi. This entry, which is identified by 
a stub method call, contains the server machine 
name, the remote object name, the remote method 
name, the identifier of the thread that executed the 
remote method, and the RMI time stamp. The server 
machine name can be used to identify the corre- 
sponding .mrg file that needs to be processed. Within 
the remote trace file, the appropriate RMl entry is 
identified by matching the remote object name, the 
remote method name, the thread identifier, and the 
time-stamp values obtained earlier from the client 
profile. The rmi traces are collected from the remote 
file until the end time stamp of the remote method 
is encountered. However, any methods that started 
before the rmi began and finished before the RMI 
returned are not included since these are part of the 
profile of the server executing the RMI call. 

Figure 8 shows the .mrg file entries for an rmi call 
from a client on Machine 1 to a server on Machine 
2. The RMI call on the client machine, which is iden- 
tified by the StockApplet.Stub.update call, is deter- 
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Figure 7 Application of tlie staci(-based aigorittim to handie tlie start and run mettiods used to initiate a new tliread 
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mined to be served by the StockApplet_Skel.dispatch 
method on Machine 2. Thus, the tree generation pro- 
cess jumps to the .mrg file of Machine 2 and finds 
the appropriate RMi entry, which is identified by the 
Stocl<Applet_SI<el.dispatch call and by matching the 
thread identifier and the time stamp. Once in this 



file, it processes all entries until the end time stamp 
of StockApplet_Skel.dispatch (Method 1664) is found. 
However, Method A on Machine 2 started before 
Method 1664 and, hence, cannot be part of the RMi. 
Thus, when the RMi traces are processed, the entry 
corresponding to Method A is ignored. 
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Figure 8 



(A) The .mrg file entries for a ciient running on IMachine 1; (B) tiie .mrg file entries for a server running on 
lUlachine 2 



METHOD IDENTIFIER OF STUB CHJL 



StockApplet_Stub.update 410 



METHOD THREAD 

> 410 S440 100 <Maclilhe2>: Stocl<Applet_SI<el:dispatch:200:<timestamp> 
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Excluding traces of other programs. In the input cli- 
ent trace file, there may be entries for RMI traces, 
identified i^y skel method entries that were remotely 
executed on this Jvm. Since these traces are part of 
an RMl call for another ciient, they belong to the ex- 
ecution tree of that other client program. Hence, they 
should be excluded when building.the execution tree 
for the current program. However, there may be 
method calls that started after the execution of the 
RMI but did not end before the RMi returned. These 
methods are part of the current profile and so should 
be included. Thus, when a skel method call is en- 
countered, the method calls that started before, but 
finished within its boundary, are included in the call 
graph. The other method calls within the boundary 
of the skel call are excluded. In the example shown 
in Figure 8, if Machine 2 is the client being processed, 
then the entries corresponding to the RMI call within 
the boundary of Method 1664 will be ignored. How- 



ever, Method A, which started before the RMI but 
completed within its boundary, must be included as 
it is part of the input client. 

Run-time statistics generation. Each detailed .prf trace 
file is analyzed to generate statistical information for 
each method. The statistical data are stored in a hash 
table keyed by method identifiers for faster access. 
The statistics generation step reads the entries in the 
.prf file and updates the hash table entry for the cur- 
rent method. The number-of-tlmes-called value is in- 
cremented each time a method entry is encountered. 
The maximum and minimum execution time values 
are updated with the current execution time values, 
if needed. For the average and standard deviation 
calculation, the appropriate execution time values 
are accumulated. Once all of the entries are processed, 
the average and the standard deviation of the execu- 
tion time for each method are calculated and stored 



110 KAZI ET AL. 



SYSTEMS JOURNAL. VOL 39. NO 1. 2000 



Figure 9 Propagation of RMI calls In the example program 



Client : remoteFoo{ ) ► Server 1 : remoteFoo( ) —. — ► Server 2... ■ » Server n : termination 



in the hash table. Finally, the statistics are copied from 
the hash table to the jvm.stt file (see Figure 1). 

Visualizer. The visualizer takes the .jta file and the 
corresponding .mtti file as its inputs. Since it processes 
the execution tree in the .jta file in a top-down man- 
ner, it needs to retrieve the root node first. The root 
node, which is the virtual root that ties together all 
of the first level calls, is stored at the end of the .jta 
file since it is the last method to finish. Thus, the vi- 
sualizer opens the .jta file and jumps directly to the 
last record in the file to retrieve the root node in- 
formation. Each node in the .jta file contains point- 
ers to its last child and its previous sibling, and a count 
of the number of children it has*. After reading this 
root node, the visualizer knows the location of the 
last child of the root and the total number of chil- 
dren it has. The visualizer then initiates a loop to 
jump from child node to child node following the 
previous-sibling links. While traversing the nodes, it 
gathers all the pertinent information for each node. 
Once all the children of the root are traversed, they 
are displayed. 

The visualizer initially displays only the first level of 
the tree since the .jta file may contain a large num- 
ber of nodes. Traversing all of them while keeping 
each node's information in memory could be quite 
time consuming and may overload the memory. Once 
the root and its children are displayed, the visualizer 
allows the user to expand and explore the tree by click- 
ing on the displayed nodes or by using a keyboard nav- 
igation technique developed for this purpose. 

To minimize the I/O overhead between the visual- 
izer and the .jta and .mth files, flags are added to the 
nodes to differentiate, for instance, between a node 
that is not loaded in memoiy and a node that is al- 
ready loaded but not yet displayed. In the latter case, 
the flag will indicate that an I/O to the .jta file is not 
necessary if the user clicks on its parent to open the 
node. This feature thus saves processing time. 



The visualizer's mapper function offers the user the 
ability to manually construct queries through a con- 
trol panel. The visualizer maintains a list of queries, 
to be processed, in memory. When the user enters 
a query through the control panel, it is added to this 
query list. When a query is applied, the visualizer 
scans the list of queries to check whether a query 
matches the node's data. If a match is found, the node 
is displayed with the parameters specified by the 
query. The process is repeated for each node until 
all matching nodes are displayed. 

The visualizer also provides a "find-next" capability 
to allow the user to search through the dynamic ex- 
ecution tree for a specific node matching some given 
characteristics, using either a depth-first or a breadth- 
first search to locate and select the nodes. This func- 
tion moves from node to node comparing the node's 
data with the given query and stopping when it lo- 
cates a node that matches the user's request. It re- 
sumes when the user again clicks the "find-next" but- 
ton. The find-next function can start from either the 
current node or from the root node. When the cur- 
rent node is selected as the starting point, only the 
subtree below the node is searched. Otherwise, the 
entire tree is searched. 

Example 

An RMl-based example Java client/server program 
is used to demonstrate how JaViz works. The pro- 
gram uses multiple servers with each server invok- 
ing a remote call to the next server. As shown in Fig- 
ure 9, a client initially invokes a remote method 
(remoteFoo) on Server 1. Server 1 in turn invokes 
the remote method on Server 2. ITiis process con- 
tinues until the remote call reaches a termination 
server. While this example is overly simplified, it 
clearly demonstrates the possibilities of more com- 
plex visualizations. 

The example Java code consists of two classes — Bea- 
conClient and BeaconServer. BeaconServer contains 



IBM SYSTEMS JOURNAL, VOL 39, NO 1, 2000 



Figure 10 Portion of the client's call graph as displayed by the visualizer of JaVIz 




Figure 11 



Data on the UnicastRef.lnvoke( ) method call In the node information window of the visuaiizer 



Load I Piviapper | ITpintf Next^l |^.;ZQ<^g |^K^J| |t ".tesj/Tl One-Level Up 



Method name :sun/rmi/server/UnicastRef.invol<e(Ljava/rmi/server/RemoteCall;)V 
Caller name :BeaconServer Stub.remoteFoo(ll)V 
IWachine name :KRYPTON ~ 
Thread number :1 

This Method ; This Call : 

# oF Calls :0 Call time :0: 00: 22. 647 

Total Time :0: 00: 00. 000 Return time :0: 00: 38. 439 
Minimum Time : 0: 00: 00. 000 Total time : 0: 00: 15. 792 

ximum Time ;0: 00: 00. 000 
Average Time : 0: 00: 00. 000 
Standard Dev : 0: 00: 00. 000 

Mr 'iHimil. 'llliM laia ^ iw.fr. . i 



Figure 1 2 Data on the BeaconSenerJSkel.dlspatcM ) method call In the node Information window of the visuaiizer 



Method name :BeaconSe.'-ver_Skel.dispatch(Ljava/rmi/Remote;Ljava/rmi/server/RemoteCali;iJ)V 
Caller name :sun/rmi/server/UnicastRef.invoke(Ljava/rml/server/RemoteCail;)V 
Machine name :KRYPTON 
Thread number :2 

This Method : This Call : 

# of Calls :0 Call time :0: 09: 28. 189 

Total Time :0: 00: 00. 000 Return time :0: 09: 43. 091 
Minimum Time: 0: 00: 00. 000 Total time : 0:00:14.901 

Maximum Time :0: 00: 00. 000 
Average Time : 0: 00: 00. 000 
Standard Dev : 0: 00: 00. 000 



Step 1 simply starts the RMI registry service. Step 2 
creates a BeaconServer instance named si that for- 
wards all calls to the BeaconServer instance named 
s2. Server .y2 is set to forward all calls to BeaconServer 

instance s3 in Step 3. Step 4 creates s3 and marks 
it as a termination server. Finally, Step 5 executes 
the BeaconClient instance to start the remote call on 
^7. Note that the modified JaViz Jvm and RMI reg- 
istry are used to execute the Java code. The -urn op- 
tion on the Java command line enables the trace gen- 
eration by the instrumented Jvm. 

After the completion of Step 5, four .prf files are gen- 
erated, one for the client and one each for the three 



servers. These .prf files are then merged to create 
the corresponding .mrg files. To visualize the call- 
graph profile of the client process cllenti, the exe- 
cution tree must be generated by applying the tree 
generation step on the client's merged trace file 
clientl.mrg. The tree generation step creates the 
clienti .jta file, which contains the execution tree rep- 
resentation of the client process. Once the tree is 
generated, it is ready for visualization. 

Loading the clienti .jta file into the visuaiizer displays 
the call graph for the client. Figure 10 shows the top 
portion of the call graph as displayed by the visu- 
aiizer. The root of the tree, as discussed earlier, is 
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Figure 13 Portion of the call graph for Server 1 




a virtual node that is used to tie together the roots 
of all of the call trees generated by a particular Jvm. 
Below the virtual root is the node representing the 
method maln( ). This node has four nodes stemming 
from it representing four method calls invoked by 
main. The first two deal with RMI security. The third 
calls the Naming. lookup method to find the server and 
initialize the appropriate stub. 

The fourth method invoked by main( ) is the remote- 
Foo( ) method, which is the currently selected node 
in Figure 10. The method name field in the node in- 
formation window indicates that this method is in- 
voked on the BeaconServer_Stub object. The Bea- 



conServer_Stub object itself invokes several other 
methods that set up the RMI connection to the server. 
The fifth method called by remoteFoo( ) is the Uni- 
cast.invoke( ) method; its node information is shown 
in Figure 11. This method creates the actual invo- 
cation on the remote machine. Following the call 
graph down leads to the child node of Unicast.invol<e 
named BeaconServer_Skel.dispatch( ), which is the 
node corresponding to the method call in Figure 12. 
This node contains a different thread number than 
its parent since the method was invoked on a sep- 
arate Jvm running a different thread. This difference 
can be seen by applying the "Jvm ID" option in the 
visualizer's mapper, which results in method calls on 
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different Jvms to be displayed in different colors, as 
shown in Figure 10. 

The dispatch method calls four methods to obtain 
the parameters that are passed to it. It then invokes 
the BeaconServer.remoteFoo method located on the 
current server. The remoteFoo method eventually 
calls the BeaconServer_Stub.remoteFoo, which is the 
remoteFoo method of the next server. This process 
is repeated two more times until it eventually reaches 
the last (i.e., termination) server. A total of four Jvnis 
are used, one for the client and one for each of the 
three servers. These four unique Jvms can be dis- 
tinguished by the four different node colors used to rep- 
resent them in the full call graph (not shown here). 

To visualize how the RMI calls are handled on the 
server side, we need to apply the tree generation step 
on the merged trace file of the particular server (e.g., 
sl.mrg) and then load the corresponding .jta file in 
the visualizer. In the example used, each server 
merely executes the RMI call that was initiated by 
the client and then forwarded to each successive 
server. Thus, most of each server's execution be- 
comes part of the client profile. The call graph for 
a server itself contains two method calls correspond- 
ing to the RMI in addition to the execution tree un- 
der its main( ) method. 

Viewing the first server's execution tree (correspond- 
ing to the file si .jta) shows the calls handled by the 
server for the client's RMI call. The server call graph 
is shown in Figure 13. The first two nodes correspond 
to the client's rmi call. Notice that each contains its 
own separate thread number, which is shown as a 
different color in Figure 13. The third node is the 
main( ) method executed when the server is first 
started. The RMI call was invoked by the client on 
the server and, hence, the nodes related to the RMI 
call are shown as the children of the virtual root and 
not as part of main( )'s tree. The ordering of the called 
methods (from left to right) under a caller in the call 
graph is determined by the time the callee finishes. 
Even though maln( ) started before the methods re- 
lated to the RMI began, the RMI calls finished before 
maln( ). Thus, the RMI nodes appear before main( ) 
in the call graph. 

Conclusion 

The JaViz performance analysis tool has been de- 
veloped primarily to address the performance tun- 
ing problems encountered when developing large- 
scale distributed Java application programs. In 



addition to generating traces for each individual 
method instance in a program, Ja Viz has the unique 
ability to trace multiple threads of execution and 
method calls (i.e., RMis) that span multiple Jvms. Be- 
sides identifying program hot spots, the "per in- 
stance" traces allow program developers to deter- 
mine whether method execution times have high or 
low variances and whether method execution times 
vary with the caller's context and parameters. The 
traces for different threads provide a means for an- 
alyzing multithreaded applications, while RMI traces 
make it possible to identify hot spots in client/server- 
based applications. Thus, JaViz appears to be very 
useful as a performance analysis tool for the devel- 
opment of large-scale client/server-ba.sed Java ap- 
plications, such as those built on the ibm SanFran- 
cisco* framework. ''' JaViz provides an easy-to-use 
interface with some useful capabilities, such as ap- 
plying specific queries and finding method calls with 
specific parameters, to analyze the execution tree. 
It also provides the u.ser with the ability to control 
what method calls are to be traced through a filter- 
ing interface. 

We are currently enhancing the features available 
in JaViz to make it more powerful. To provide users 
with more control over what to trace, we are design- 
ing other filtering options, .such as specifying what 
not to trace (as well as what to trace) and specifying 
wild-card options, such as "trace all com.ibm.sf.gf." 
method calls." With more filtering options available, 
we can further reduce the size of the trace data gen- 
erated. This reduction improves the processing time 
of later steps in the JaViz tool set to reduce the per- 
formance perturbation introduced by JaViz. We also 
plan to add a visual interface to JaViz that will allow 
the users to specify tracing options more easily. 

To speed up the trace generation process, we will 
compress the trace file size by writing the trace out- 
puts in a binary format. Currently, the traces are con- 
verted to text format before being written to the out- 
put file. This text conversion requires a substantial 
amount of CPU time, \yhich causes perturbations to 
the execution of the application program. Using a 
binary format for the traces will reduce the amount 
of perturbation. While we do not have detailed in- 
formation about the perturbation overhead of JaViz, 
nor the time required to generate and visualize a 
trace, we do have some anecdotal experience. In one 
instance, for example, we found that a trace file of 
17 megabytes required approximately 30 minutes to 
process and display on a Pentium** II processor, with 
Windows NT, Version 4.0, running at 400 megahertz. 
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Additional information will be incorporated into the 
visualizer window to enhance the visualization pro- 
cess. For example, a feature that would be useful in 
a large tree is to pop up the call graph for a selected 
node so that the user can readily visualize the path 
taken from the root to reach the current method call. 
This feature is planned for the next version of JaViz. 
With further enhancements, we hope to broaden the 
utility of JaViz as a performance analysis tool for 
large-scale Java application program development. 

A preliminary version of the J a Viz performance anal- 
ysis tool can be downloaded for experimental pur- 
poses. " Using this version, the largest program that 
we have attempted to trace consisted of approxi- 
mately 130 unique methods with 521410 unique in- 
stances of these methods. This program is a 
client/server application for stock updates. There is 
a single sei"ver and one or more clients. Each client 
application requests the server for a stock update 
notification. The server sends the update informa- 
tion back to the requesting client for display. The 
execution time for this program increased by a fac- 
tor of approximately 1.5 due to the overhead of the 
tracing process. The merging and postprocessing 
steps required about 30 minutes before the final trace 
could be visualized. 
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